Text Reader for Visually Impaired Person using Image Processing/Open-CV

Authors: Aparna.V. Mote, Rutuja Akhare, Vaishnavi Barde, Gitanjali Bhujbal

DOI Link: https://doi.org/10.22214/ijraset.2023.53004

Abstract

The main issue that visually impaired individuals confront these days is that they are unable to read printed text on their own, which forces them to rely on others for day-to-day tasks such as reading newspapers, writing letters, and referring to books. This dependence may erode their confidence because they are unable to cope on their own. The project's ultimate goal is to assist visually challenged persons with text recognition. This goal is accomplished by creating a module that converts text into speech and speaks it through the provided headphone or speaker. The code is written in Python and uses the pytesseract and gtts packages. For character recognition, this project employs image processing and the OCR approach.

I. INTRODUCTION

The main issue that visually impaired persons face these days is that they are unable to read printed text on their own, which forces them to rely on others for day-to-day tasks such as reading newspapers, writing letters, and referring to websites or books. This dependence may reduce their confidence, because they cannot manage independently. The project's ultimate goal is to assist visually challenged persons in recognizing text. This goal is accomplished by creating a module that converts text into speech and speaks it through the provided headphone or speaker. When printed text is displayed in front of the webcam, the system must capture the image, extract the text from the image, and read out the text via computer audio or headphones. The image is captured using the system's webcam, the text is extracted by the application, and the recognized words are then spoken aloud through headphones or the system's audio. The code is written in Python and uses the pytesseract and gtts packages. For character recognition, this project employs image processing and the OCR approach. Python also offers PIL (the Python Imaging Library), which is used for simple image operations such as creating thumbnails, resizing, rotating, and converting between different file formats.

II. OBJECTIVE

This project is designed to overcome the limitations of Braille using IoT technology. It is built on a small, low-cost single-board computer, the Raspberry Pi. The visual data is sent to the single-board computer over a WiFi connection. The image is processed to perform image-to-text and text-to-voice conversion using converters available online. The book reader captures pictures of book pages using a camera and then processes the images using OCR software. Once the text in the image is recognized, the book reader reads it aloud. Thus, blind people and those with low vision can hear the text without needing to trace it with their fingertips, as Braille requires.

This system has the following modules:

Requirements Planning
Pre-processing
Character recognition
Development
Text to Speech Synthesis

III. LITERATURE SURVEY

OCR based facilitator for the visually challenged: This paper encouraged us to undertake this project. From it we learned that many people are blind or visually impaired (BVI). The paper also gave us a brief overview of OCR technology and implementation details, which were very useful [2]. We used it as a reference and have tried to approach the problem in an efficient way.

Smart Reader for Visually Impaired People Using Raspberry Pi: This paper proposes a way to convert an image into text and text into audio. It also gives complete information about the hardware and software implementation of a reader for the blind.

The software implementation and programming details in this paper, along with the details of the OCR engine, were very useful. The paper gave detailed information about which engines to use for image-to-text and text-to-speech conversion.

“OCR based automatic book reader for the visually impaired using Raspberry PI”: This paper provided a case study. From it we learned to build a system for the English language, and it led us to think that the same could be done for other languages, which we have listed as a future enhancement [3]. The system accepts a page of printed text with English numerals and scans it into a digital document, which is then subjected to skew correction and segmentation before feature extraction and classification. Once classified, the text is read out by a text-to-speech conversion unit.

An innovative, efficient, real-time, and cost-beneficial technique that enables the user to hear the contents of text images instead of reading through them has been introduced [6]. It combines Optical Character Recognition (OCR) and a Text to Speech Synthesiser (TTS) on a Raspberry Pi. Optical character recognition is used to digitize and reproduce texts that were produced with non-computerized systems. Digitizing texts also helps reduce storage space [7].

The design and implementation of an automatic scene text detection and recognition system for visually impaired people has been discussed. Combining different techniques for text detection and extraction results in a more accurate system than using a single technique for the overall system. Text recognition is successfully performed using a pattern matching technique. After successful recognition, the text is converted into audio output. A prototype system to read printed text on hand-held objects for assisting blind people has been proposed [9]. To extract text regions from complex backgrounds, a novel text localization algorithm based on models of stroke orientation and edge distributions is adopted. An image-to-speech conversion technique using Raspberry Pi was implemented. The output was tested using different samples. The algorithm successfully processes the image and reads it out clearly.

IV. METHODOLOGY

The proposed system is a software module that takes input from the system's built-in camera or a webcam, extracts the text content using the code developed, converts the text to speech, and reads it out through the headphone or speaker. The project removes the need for a Raspberry Pi board, which is considered one of the greatest advantages of the proposed system. Speech and text are the main media of human communication. A person needs vision to access the information in a text; however, those who have poor vision can gather information from voice. This paper proposes camera-based assistive text reading to help visually impaired persons read the text present in a captured image. The proposed idea involves reading the image using OpenCV (the cv2 Python library), extracting text from the scanned image using Tesseract Optical Character Recognition (OCR), and converting the text to speech with gtts (Google Text To Speech), a process that lets visually impaired persons access the text. This is a prototype that helps blind people recognize products in the real world by extracting the text in an image and converting it into speech. Because the proposed method only requires installing software, it is more portable and less expensive. Optical character recognition (OCR) systems provide persons who are blind or visually impaired with the capacity to scan printed text and then have it spoken in synthetic speech or saved to a computer file. There are three essential elements to OCR technology: scanning, recognition, and reading text. The data that we collect or generate is mostly raw data, i.e. it is not fit to be used directly in applications.
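The capture, recognition, and speech steps described above can be sketched in Python as follows. This is a minimal illustration and not the authors' code: the function names, the camera index 0, the output file name speech.mp3, and the whitespace clean-up step are all assumptions. It presumes that the opencv-python, pytesseract, and gtts packages and the Tesseract engine are installed, and that a network connection is available for gTTS.

```python
import re


def clean_ocr_text(raw):
    """Collapse line breaks and repeated spaces into single spaces."""
    return re.sub(r"\s+", " ", raw).strip()


def capture_frame(camera_index=0):
    """Grab one frame from the webcam (index 0 is assumed to be the default camera)."""
    import cv2  # imported here so clean_ocr_text() works without OpenCV

    camera = cv2.VideoCapture(camera_index)
    try:
        ok, frame = camera.read()
    finally:
        camera.release()
    if not ok:
        raise RuntimeError("could not read a frame from the camera")
    return frame


def frame_to_text(frame, lang="eng"):
    """Run Tesseract OCR on a BGR frame and return the cleaned text."""
    import cv2
    import pytesseract

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    return clean_ocr_text(pytesseract.image_to_string(gray, lang=lang))


def text_to_mp3(text, out_path="speech.mp3", lang="en"):
    """Synthesise speech with gTTS (needs network access) and save it as MP3."""
    from gtts import gTTS

    gTTS(text=text, lang=lang).save(out_path)
    return out_path


def read_printed_text(camera_index=0, out_path="speech.mp3"):
    """Capture, recognise, and synthesise; returns the MP3 path, or None if no text."""
    text = frame_to_text(capture_frame(camera_index))
    if not text:
        return None  # gTTS rejects empty input, so stop here
    return text_to_mp3(text, out_path)
```

Calling read_printed_text() with a page held in front of the camera would produce an MP3 file that can then be played through the speaker or headphones with any audio player. Playback is left out of the sketch because it depends on the operating system.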

V. MODELING AND ANALYSIS

Hardware Requirement

Webcam/ inbuilt camera with the system.
Headphone / speaker.

A. Software Requirement

Python 3.8.1
Imported modules: pytesseract, gtts, os.

B. Technologies Used

Image processing
OCR technique (Optical Character Recognition)
GTTS (Google Text To Speech Converter)

C. Image Processing

OpenCV is an image processing library mainly focused on real-time computer vision, with applications in a wide range of areas such as 2D and 3D feature toolkits, facial and gesture recognition, human-computer interaction, mobile robotics, object identification, and others.

The image processing is done using the OpenCV library (cv2). To perform basic operations on images, such as creating thumbnails, resizing, rotating, and converting between different file formats, we use PIL. The image is loaded directly using the open() function of the Image class. This returns an image object that contains the pixel data for the image as well as details about the image. The format property of the image reports the image format (e.g. PNG, JPEG), the mode reports the pixel channel format (e.g. CMYK or RGB), and the size reports the dimensions of the image in pixels (e.g. 400*260). The show() function displays the image using the operating system's default application. Pillow is one of the most popular Python libraries for image processing and is often considered the default. It is an updated version of the Python Imaging Library (PIL) and supports a range of simple and advanced image manipulation functionality. It is also the basis for simple image support in other Python libraries such as SciPy and Matplotlib.

OCR TECHNIQUE: Optical character recognition or optical character reader (OCR) is the electronic or mechanical conversion of images of typed, handwritten, or printed text into machine-encoded text, whether from a scanned document, a photo of a document, a scene photo, or subtitle text superimposed on an image. It deals with recognizing text in image files and storing it in a text file. Here, we process the images and convert them into text. Once we have the text as a string variable, we can do any processing on the text. Optical character recognition involves detecting text content in images and translating it into encoded text that the computer can easily understand. An image containing text is scanned and analyzed in order to identify the characters in it. Upon identification, each character is converted to machine-encoded text.
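The two steps described in this section, inspecting an image with Pillow and then obtaining its text as a string, might look like the sketch below. The helper names summarize and describe_and_read are hypothetical and are not taken from the project; the sketch assumes that Pillow, pytesseract, and the Tesseract engine are installed.

```python
def summarize(fmt, mode, size):
    """Format Pillow's image details as one readable line."""
    width, height = size
    return f"{fmt} image, {mode} mode, {width}x{height} pixels"


def describe_and_read(path):
    """Open an image, report its details, and return (details, OCR text)."""
    # Imported here so summarize() stays usable without these packages.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        # format, mode, and size are the Pillow properties discussed in the text
        details = summarize(img.format, img.mode, img.size)
        # image_to_string returns the recognised text as an ordinary str
        text = pytesseract.image_to_string(img)
    return details, text
```

Because the OCR result is a plain string, any further processing (trimming, searching, passing it on to a speech engine) can be done with ordinary string operations.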
The image is then split into zones that identify the areas of interest, such as where the images or text are, which starts the extraction process. The areas containing text are then broken down further into lines, words, and characters, and the software matches the characters through comparison and various detection algorithms. The final result is the text in the given image.

The fundamental information gathered from web sources is still presented in its unprocessed state as statements, numbers, and qualitative phrases. There are mistakes, omissions, and discrepancies in the raw data. After carefully examining the filled questionnaires, modifications are necessary. Processing the primary data involves the following steps. Field surveys generate a tremendous amount of raw data, which must be classified according to the similarity of the individual responses. Data preprocessing is a method for transforming unclean data into clean data sets. In other words, whenever data are collected from several sources, they are combined into a raw format that is not useful for analysis. As a result, specific actions are taken to reduce the data to a manageable and clean collection.
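The breakdown of text areas into lines and words described in this section can be observed through pytesseract's image_to_data function, which reports a block, paragraph, and line index for every recognised word. The following sketch groups words back into lines; the function names and the optional confidence cut-off min_conf are assumptions made for illustration and are not part of the original project.

```python
from collections import defaultdict


def words_by_line(data, min_conf=0.0):
    """Group recognised words into lines using Tesseract's layout indices.

    `data` is a dict of equal-length lists, in the shape returned by
    pytesseract.image_to_data(..., output_type=Output.DICT).
    """
    lines = defaultdict(list)
    for i, word in enumerate(data["text"]):
        if not word.strip():
            continue  # rows describing a page, block, or line carry no text
        if float(data["conf"][i]) < min_conf:
            continue  # drop words below the chosen confidence
        key = (data["block_num"][i], data["par_num"][i], data["line_num"][i])
        lines[key].append(word)
    # keys are unique, so sorting orders the lines by block, paragraph, line
    return [" ".join(words) for _, words in sorted(lines.items())]


def image_lines(image, min_conf=0.0):
    """Run Tesseract on an image and return its text as a list of lines."""
    import pytesseract
    from pytesseract import Output

    data = pytesseract.image_to_data(image, output_type=Output.DICT)
    return words_by_line(data, min_conf)
```

Keeping the grouping step separate from the OCR call makes it possible to check the logic on a hand-written dictionary without a camera or the Tesseract engine.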

This technique is performed before the execution of iterative analysis. This set of steps is known as data preprocessing. It includes data cleaning, preprocessing, feature extraction, and classification.

Two modules, User and Doctor, have been developed. The incremental build model is a method of software development in which the product is designed, implemented, and tested incrementally (a little more is added each time) until the product is finished. It involves both development and maintenance.

D. Google Text To Speech Converter

gTTS (Google Text-to-Speech) is a Python library and CLI tool that interfaces with Google Translate's text-to-speech API. There are several APIs available to convert text to speech in Python; one of them is the Google Text to Speech API, commonly known as the gTTS API. gTTS is a very easy-to-use tool that converts the entered text into audio, which can be saved as an MP3 file.

The gTTS API supports several languages, including English, Hindi, Tamil, French, German, and many more. The speech can be delivered at either of two audio speeds, fast or slow. However, as of the latest update, it is not possible to change the voice of the generated audio.
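As a hedged illustration of the language and speed options mentioned above, the sketch below maps the language names listed in this section to their gTTS codes and passes the slow flag through to gTTS. The helper names, the small lookup table, and the default file name are assumptions; gTTS itself needs network access when save() is called.

```python
# Codes for the languages named in the text (not the full gTTS list).
GTTS_CODES = {
    "english": "en",
    "hindi": "hi",
    "tamil": "ta",
    "french": "fr",
    "german": "de",
}


def gtts_code(language_name):
    """Translate a language name such as 'Hindi' into its gTTS code."""
    key = language_name.strip().lower()
    if key not in GTTS_CODES:
        raise ValueError(f"no gTTS code known for {language_name!r}")
    return GTTS_CODES[key]


def save_speech(text, language_name="English", slow=False, out_path="speech.mp3"):
    """Convert text to speech and save it as an MP3 file."""
    from gtts import gTTS  # imported here so gtts_code() works offline

    # slow=True selects the slower of the two available speeds
    gTTS(text=text, lang=gtts_code(language_name), slow=slow).save(out_path)
    return out_path
```

For example, save_speech("नमस्ते", "Hindi", slow=True) would write a slowly spoken Hindi greeting to speech.mp3, provided the machine is online.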

VI. EXISTING SYSTEM

There is a growing demand for users to convert printed documents into electronic documents in order to keep their data secure. The basic OCR system was therefore invented to convert the data available on paper into computer-processable documents, so that the documents can be edited and reused. Existing OCR systems on a grid infrastructure are mostly based on expensive and complex hardware setups. This raises the cost of the system and limits its use, depending on availability and on what a blind person can afford. In such a system, the images are refined in order to eliminate any noise present in them. Segmentation is used to separate each character from the others in the text. Graphical details such as icons or logos, if any, are eliminated. Each character obtained is compared with the datasets that are part of the Tesseract library. Tesseract OCR is the most efficient algorithm available; it checks the obtained character in ten dimensions. Once the character is recognized, it must be made available as audio output. For this, software called Festival is used to provide the audio output for the recognized character. Apart from these features, an extra feature is added that lets the blind user know the type of object he or she is interacting with (a menu, a newspaper, and the like). An ultrasonic sensor is included as part of the project, so that characters are obtained only within a particular distance.
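The noise removal step mentioned above is not specified in detail. One common way to prepare a camera frame for OCR with OpenCV is sketched below, purely as an assumption: a median blur for noise and Otsu thresholding to separate characters from the background. Neither technique is named in the systems surveyed here, and the function names are hypothetical.

```python
def odd_kernel(size):
    """Return a valid median-blur kernel size: odd and at least 3."""
    size = max(int(size), 3)
    return size if size % 2 == 1 else size + 1


def clean_for_ocr(frame, kernel=3):
    """Convert a BGR frame to a denoised black-and-white image."""
    import cv2  # imported here so odd_kernel() works without OpenCV

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    # medianBlur requires an odd kernel size greater than 1
    denoised = cv2.medianBlur(gray, odd_kernel(kernel))
    # Otsu's method picks the threshold automatically (assumed technique)
    _, binary = cv2.threshold(
        denoised, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU
    )
    return binary
```

The binary image returned by clean_for_ocr could then be passed to the OCR engine in place of the raw frame.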

VII. PROPOSED WORK

The suggested system is a software module that accepts input from the user, extracts the text content using the code produced, converts the text to speech, and reads it out through the headphone or speaker. Our system can read multiple languages, e.g. Hindi, English, and Russian. This project eliminates the use of the Raspberry Pi board, which is regarded as one of the most significant advantages of the suggested system. The proposed idea entails reading the image with OpenCV (the cv2 Python library), extracting text from the scanned image using Tesseract Optical Character Recognition (OCR), and converting the text to speech using gtts (Google Text To Speech), a process that gives visually impaired people access to the text. The proposed solution only requires installing software, making it more portable and less expensive. Those who are blind or visually handicapped can use optical character recognition (OCR) equipment to scan printed text and have it read in synthetic speech or saved to a computer file. The data we collect or generate is generally raw data, which means it cannot be used directly in applications for a variety of reasons. As a result, we must first examine it, then do the necessary processing, and only then use it for modeling and analysis.
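Multi-language reading with Tesseract is usually requested by joining language codes with a plus sign in the lang argument. The sketch below shows one possible way to build that argument for the three languages named above; the helper names and the lookup table are assumptions, and the corresponding Tesseract language data files must be installed for recognition to work.

```python
# Tesseract codes for the languages named in the text.
TESSERACT_CODES = {"english": "eng", "hindi": "hin", "russian": "rus"}


def tesseract_lang_arg(languages):
    """Build a lang string such as 'eng+hin' from language names."""
    codes = []
    for name in languages:
        key = name.strip().lower()
        if key not in TESSERACT_CODES:
            raise ValueError(f"no Tesseract code known for {name!r}")
        code = TESSERACT_CODES[key]
        if code not in codes:  # keep order, skip duplicates
            codes.append(code)
    return "+".join(codes)


def read_multilingual(image, languages=("English",)):
    """Run OCR on an image that may contain several languages."""
    import pytesseract  # imported here so the helper above works without it

    return pytesseract.image_to_string(image, lang=tesseract_lang_arg(languages))
```

A page mixing English and Hindi could then be read with read_multilingual(image, ["English", "Hindi"]).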

IX. FUTURE SCOPE

Integration with Mobile Devices: As smartphones and tablets become more powerful, integrating text readers with mobile devices will become easier. This will allow visually impaired persons to access text from a wider range of sources, including social media, e-books, and websites.
Multi-language Support: With globalization, the need for text readers that can support multiple languages will continue to grow. Using OpenCV, it is possible to train text recognition models for various languages.
Improved Accessibility in Public Spaces: Public spaces can be challenging for visually impaired persons to navigate. The use of text readers on signage, for example, could make it easier for them to access information.

Conclusion

The suggested system's ultimate goal has been met. This technology converts text to speech and serves as a text reader for the visually handicapped. The text is displayed in front of the system's webcam or integrated camera. Image processing is used to examine the acquired image. OCR (Optical Character Recognition) separates and identifies the words in the image in order to recognise the characters. The words acquired are then transformed to speech using GTTS (Google Text To Speech converter). Lastly, the collected text is read out via the speaker or headphones. As a result, visually handicapped persons benefit from the system.

References

[1] Bindu Philip and R. D. Sudhaker Samuel, "Human machine interface - a smart OCR for the visually challenged," International Journal of Recent Trends in Engineering, vol no. 3, November 2009.
[2] K. Nirmala Kumari, Meghana Reddy J, "Image Text to Speech Conversion Using OCR Technique in Raspberry Pi," International Journal of Advanced Research in Electrical, Electronics and Instrumentation Engineering, Vol. 5, Issue 5, May 2016.
[3] V. Ajantha Devi, Dr. Santhosh Baboo, "Embedded optical character recognition on Tamil text image using Raspberry Pi," International Journal of Computer Science Trends and Technology (IJCST), Volume 2, Issue 4, Jul-Aug 2014.
[4] Jaiprakash Verma, Khushali Desai, "Image to sound conversion," International Journal of Advance Research.
[5] R. Mithe, S. Indalkar and N. Divekar, "Optical Character Recognition," International Journal of Recent Technology.
[6] Akhilesh A. Panchal, Shrugal Varde, M. S. Panse, "Character Detection and Recognition System for Visually Impaired People."
[7] Giudice, N. A., & Legge, G. E., "Blind navigation and the role of technology," in A. Helal, M. Mokhtari & B. Abdulrazak (Eds.), Engineering handbook of smart technology for aging, disability, and independence (pp. 479-500), John Wiley & Sons.
[8] Sunil Kumar, Rajat Gupta, Nitin Khanna, Santanu Chaudhury and Shiv Dutt Joshi, "Text Extraction and Document Image Segmentation Using Matched Wavelets and MRF Model," IEEE Transactions on Image Processing, Volume 16, Issue 8, Aug. 2007, pp. 2117-2128.
[9] Ray Kurzweil, K Reader Mobile User Guide, knfb Reading Technology Inc. (2008). [Online]. Available: http://www.knfbReading.com
[10] Ms. Athira Panicker, Ms. Anupama Pandey, Ms. Vrunal Patil, "Smart Shopping assistant label reading system with voice output for blind using Raspberry Pi," YTIET, University of Mumbai, International Journal of Advanced Research in Computer Engineering & Technology (IJARCET), ISSN: 2278-1323, Vol. 5, Issue 10, Oct 2016, 2553, www.ijarcet.org
[11] Raspberry pi 3b, Optical Character Recognition (OCR), Text to speech (TTS), Pi-camera, Speaker, Headphone
[12] Gurav, Mallapa D., et al. "B-LIGHT: A Reading aid for the Blind People using OCR and OpenCV." International Journal of Scientific Research Engineering & Technology (IJSRET), ISSN (2017).
[13] Goel, Anush, et al. "Raspberry Pi Based Reader for Blind People." International Research Journal of Engineering and Technology 5.6
[14] Chaudhari, Harshada. "Raspberry Pi technology: a review." International Journal of Innovative and Emerging Research in Engineering 2.3

Copyright

Copyright © 2023 Aparna.V. Mote, Rutuja Akhare, Vaishnavi Barde, Gitanjali Bhujbal. This is an open access article distributed under the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Paper Id : IJRASET53004

Publish Date : 2023-05-25

ISSN : 2321-9653

Publisher Name : IJRASET
